Abstract

Background: Genetic engineering of industrial microorganisms often suffers from undesirable side effects on 
essential functions. Reverse engineering is an alternative strategy to improve multifactorial traits like low glycerol/ 
high ethanol yield in yeast fermentation. Previous rational engineering of this trait always affected essential 
functions like growth and stress tolerance. We have screened Saccharomyces cerevisiae biodiversity for specific 
alleles causing lower glycerol/higher ethanol yield, assuming higher compatibility with normal cellular functionality. 
Previous work identified ssk1^E330N...K356N as a causative allele in strain CBS6412, which displayed the lowest
glycerol/ethanol ratio.

Results: We have now identified a unique segregant, 26B, that shows similar low glycerol/high ethanol production
as the superior parent, but lacks the ssk1^E330N...K356N allele. Using segregants from the backcross of 26B with the
inferior parent strain, we applied pooled-segregant whole-genome sequence analysis and identified three minor
quantitative trait loci (QTLs) linked to low glycerol/high ethanol production. Within these QTLs, we identified three
novel alleles of known regulatory and structural genes of glycerol metabolism, smp1^R110Q,P269Q, hot1^P107S,H274Y
and gpd1^L164P, as causative genes. All three genes separately caused a significant drop in the glycerol/ethanol
production ratio, while gpd1^L164P appeared to be epistatically suppressed by other alleles in the superior parent.
The order of potency in reducing the glycerol/ethanol ratio of the three alleles was:
gpd1^L164P > hot1^P107S,H274Y > smp1^R110Q,P269Q.

Conclusions: Our results show that natural yeast strains harbor multiple specific alleles of genes controlling 
essential functions, that are apparently compatible with survival in the natural environment. These newly identified 
alleles can be used as gene tools for engineering industrial yeast strains with multiple subtle changes, minimizing 
the risk of negatively affecting other essential functions. The gene tools act at the transcriptional, regulatory or 
structural gene level, distributing the impact over multiple targets and thus further minimizing possible side-effects. 
In addition, the results suggest polygenic analysis of complex traits as a promising new avenue to identify novel 
components involved in cellular functions, including those important in industrial applications. 
Introduction

Rational genetic modification of industrial microorgan- 
isms using targeted deletion and/or overexpression of 
structural or regulatory genes very often results in un- 
desirable side effects on other essential functions [1-8]. 
This has severely compromised the development of new 
superior industrial microorganisms. Glycerol yield in 
Saccharomyces cerevisiae is a complex genetic trait with 
great industrial importance. Low glycerol production is es- 
sential for maximal yield in bioethanol production [5,9,10] 
while a high glycerol yield and a reduced ethanol yield are 
positive traits in wine production [11-14]. Rational genetic 
engineering of glycerol yield by modification of the main 
structural gene, GPD1, encoding glycerol 3-phosphate
dehydrogenase (GPDH), the rate limiting enzyme of the 
glycerol biosynthesis pathway, has not been successful in 
obtaining appropriate industrial yeast strains with a modi- 
fied glycerol/ethanol ratio due to the negative side-effects 
on other phenotypic traits. Deletion and even reduced 
expression of GPD1 lowers growth and fermentation rates
[2-5] while overexpression causes redox imbalance and 
overproduction of acetate and other by-products [1]. 
Genetic analysis of natural S. cerevisiae strains exhibiting 
an inherent glycerol yield significantly different from that
of the industrial yeast strains to be improved, offers a 
promising strategy to identify mutant alleles suitable as 
gene tools for engineering glycerol production to obtain 
lower or higher yield, without causing negative side-effects 
on other essential traits. 

Glycerol production is of great physiological importance 
in S. cerevisiae. Besides CO2, glycerol is the main quantita- 
tively important side-product of yeast ethanolic fermenta- 
tion. It is synthesized from dihydroxyacetone phosphate 
(DHAP) by the consecutive action of glycerol 3-phosphate 
dehydrogenase, encoded by the isogenes GPD1 and
GPD2, and glycerol 3-phosphate phosphatase, encoded by
the isogenes GPP1 and GPP2 [3,15-17]. The first step of
glycerol formation is accompanied by the oxidation of
NADH + H+ to NAD+. One important cellular function of
glycerol formation is to regenerate NAD+ during anaerobic
growth in order to maintain the cytosolic redox
balance. This is crucial since intermediates from the lower
part of glycolysis are withdrawn for multiple biosynthetic
pathways. As a result, some of the NADH + H+ generated
upstream in glycolysis cannot be regenerated through 
ethanol formation. Glycerol formation is also essential 
during osmostress where it serves as the major compatible 
osmolyte. The high osmolarity glycerol (HOG) pathway 
plays an important role in the stimulation of glycerol pro- 
duction during osmostress and has been elucidated and 
characterised in great detail [18]. It involves osmosensing 
proteins at the level of the plasma membrane, a MAP kin- 
ase signaling pathway and transcription factors and other 
target proteins, that regulate glycerol production and 



intracellular accumulation. Both physiological functions of 
glycerol formation, i.e. redox balancing and coping with 
osmostress, are important during industrial ethanol pro- 
duction due to anaerobic conditions and high sugar con- 
centrations (osmotic pressure) at the beginning of the 
process. 

Glycerol production is a complex quantitative trait and 
glycerol yield was shown to be highly variable within the 
species S. cerevisiae [19]. This intraspecies diversity
provides a promising starting point to understand and
engineer the genetic basis for a low glycerol yield in in- 
dustrial strains of S. cerevisiae. Pooled-segregant whole- 
genome sequencing has been developed as an efficient 
method to map quantitative trait loci (QTLs) involved in 
complex traits [20-24] and reciprocal hemizygosity ana- 
lysis to identify the causative genes in the QTLs [25]. 
Random inbreeding of segregants combined with pheno- 
typic selection can be used to increase the recombin- 
ation frequency, making the QTLs smaller and thus 
facilitating identification of the causative genes [23]. In 
this case, millions of segregants were used and submit- 
ted to phenotypic selection, which enabled identification 
of many minor QTLs and the causative genes within 
these QTLs. However, this strategy only works for se- 
lectable traits. Most industrially relevant complex traits 
are non-selectable and phenotyping such large numbers 
of segregants is not feasible in practice. Hence, it re- 
mains highly important to develop alternative method- 
ologies for analyzing minor QTLs in an efficient and 
reliable way that are applicable to hundreds instead of 
thousands or millions of segregants. Reliable identifica- 
tion and analysis of minor QTLs and their causative 
genes is challenging because they show only weak link- 
age and their contribution to the phenotype is easily 
overruled by major causative genes and/or can be re- 
placed by other minor causative genes. One strategy to 
identify minor QTLs consists of replacing in the super- 
ior parent the causative alleles identified in major QTLs 
by the corresponding inferior alleles from the control 
parent strain. The resulting downgraded superior strain 
is then crossed again with the control parent strain [26]. 
Similarly, major QTLs were eliminated by targeted 
backcrossing to reveal minor QTLs [27,28]. A disadvan- 
tage of this strategy is that the phenotypic difference be- 
tween the parent strains becomes less obvious and that 
therefore larger numbers of segregants may be required 
for reliable phenotyping and QTL mapping. Another 
strategy to identify minor QTLs is to increase the strin- 
gency of phenotypic screening. Swinnen et al. [20] 
showed that selection of yeast segregants tolerant to 
17% ethanol versus 16% ethanol, strengthened the link- 
age of several minor QTLs, facilitating their further ana- 
lysis. However, this methodology also requires higher 
numbers of segregants to be phenotyped. 
In the present paper, we present a novel approach to
identify minor QTLs, which does not suffer from the 
drawbacks that the phenotypic difference between the par- 
ent strains becomes smaller or that the number of segre- 
gants required for the screening increases. We have 
screened the F1 segregants for the combined presence of
the superior trait and absence of a major causative gene 
previously identified. Only one such segregant could be 
identified, which was then used in a backcross with the 
inferior parent strain. We demonstrate that the segregants 
from this cross can be successfully used to map minor 
QTLs, of which we validated several by identifying the 
causative genes. This approach was applied to the non- 
selectable phenotype of low glycerol/high ethanol produc- 
tion in yeast fermentation, for which we previously 
identified ssk1^E330N...K356N as a major causative allele [19].
A backcross with the single segregant displaying low glycerol
yield and lacking the ssk1^E330N...K356N allele enabled
identification of three new minor QTLs, in which we identified
as causative genes specific alleles of known genes in
glycerol metabolism and its regulation, each causing a 
reduction of glycerol yield. 

Results 

Selection of a rare segregant displaying the trait of low 
glycerol/high ethanol yield and lacking the major 
causative allele ssk1^E330N...K356N

Previous work has identified the S. cerevisiae strain
CBS6412 as a strain with an unusually low ratio of glycerol/ethanol
yield and genetic analysis identified the
ssk1^E330N...K356N allele as a major causative gene [19]
(Figure 1a). In order to identify the minor QTLs and their
causative genes responsible for determining this complex
trait, we have first screened all superior segregants with a
glycerol/ethanol ratio as low as the superior parent strain,
for a segregant that lacked the ssk1^E330N...K356N allele.
Among the 44 superior segregants available, only a single 
such segregant, 26B, was present. Its glycerol yield was 
equally low and its ethanol yield equally high as the super- 
ior parent strain CBS4C, both in minimal medium with 
5% glucose and in rich YP medium with 10% glucose 
(Figure 1b). Hence, 26B showed the same phenotypic
difference with the inferior parent strain ER7A as CBS4C
(Figure 1b).

Backcross of the unique superior segregant 26B with the 
inferior parent ER7A and screening for superior 
segregants 

We next switched the mating type of 26B from Mata to 
Mata (see Materials and methods) and crossed the Mata 
26B strain with the Mata inferior parent strain, ER7A, 
which is a derivative of the industrial strain Ethanol Red, 
currently used worldwide in bioethanol production. The 
hybrid diploid ER7A/26B showed a glycerol/ethanol 
Figure 1 Phenotypes of the parental strains ER7A and CBS4C
and the segregant 26B. (a) Scheme of the crossings to map
mutations linked to the low glycerol yield phenotype. The initial
parental cross of ER7A and CBS4C resulted in the segregant 26B
with a low glycerol phenotype but without the ssk1^E330N...K356N
allele. The 26B segregant was crossed back with the inferior parent
ER7A to find other linked mutations. (b) Glycerol and ethanol yield
(on glucose) obtained in minimal medium with 5% glucose and in 
YP 10% glucose for the parental strains, ER7A and CBS4C, the 
segregant 26B, and the hybrid diploid 26B/ER7A. Three independent 
fermentations were performed with each strain. 



yield phenotype, which was intermediate between that of 
ER7A and 26B (Figure 1b). The hybrid was sporulated
and 260 meiotic segregants were screened for low gly- 
cerol yield (and corresponding higher ethanol produc- 
tion) in 100 ml fermentations with YP 10% glucose. The 
parent strains 26B and ER7A, and the hybrid diploid, 
were used as controls in each batch of fermentations. 

Glycerol and ethanol yield of the segregants in each 
batch were normalized to those of 26B, which were set 
to 100%. ER7A and the diploid 26B/ER7A showed an 
average glycerol yield of 146% and 124% and a decreased 



Hubmann et al. Biotechnology for Biofuels 2013, 6:87
http://www.biotechnologyforbiofuels.com/content/6/1/87






ethanol yield of 98.1% and 99.4% (Figure 2a). The gly- 
cerol and ethanol yield of the segregants showed a 
Gaussian distribution, which extended over the range 
of the two parental strains. In the case of the lowest 
glycerol yield, this extension was only marginal. The 
population means of the glycerol yield (123%) and 
ethanol yield (98.8%) were close to those of the dip- 
loid 26B/ER7A. In general, glycerol and ethanol yield 
of the segregant population correlated inversely (as 
determined with a Pearson test), meaning that low 
glycerol yield was usually accompanied by high etha- 
nol yield. Nearly all exceptions to this correlation were 
segregants with an unusually low ethanol yield that 
failed to show a correspondingly higher glycerol yield. 
To compose the pool of selected superior segregants, two 
cut-off criteria were defined, a glycerol yield lower than 
120% of 26B and an ethanol yield higher than 99% of 26B. 
These cut-off criteria resulted in the selection of a set of 
34 superior segregants. These were all retested in 100 ml 
fermentations with YP 10% glucose and 22 segregants 
showed again a low glycerol yield combined with a corres- 
pondingly higher ethanol yield using the same cut-off 
criteria (Figure 2b). These 22 segregants were selected for 
QTL mapping with pooled-segregant whole-genome se- 
quence analysis. A second pool with 22 randomly selected 
segregants was also subjected to pooled-segregant whole- 
genome sequence analysis and referred to as the unse- 
lected control pool (Figure 2b). 



Pooled-segregant whole-genome sequence analysis and 
QTL mapping 

The genomic DNA of the selected and unselected pools, 
as well as the parent strain 26B, was extracted and 
submitted to custom sequence analysis using Illumina
HiSeq 2000 technology (BGI, Hong Kong, China). The 
genome sequence of the parent strain ER7A has been 
determined in our previous study (data accession num- 
ber SRA054394) [19]. Read mapping and single nucleo- 
tide polymorphism (SNP) filtering were carried out as 
described previously [20,29]. The SNP variant frequency 
was plotted against the SNP chromosomal position 
(Figure 3). Of the total number of 21,818 SNPs between 
CBS4C and ER7A, 5,596 SNPs of CBS4C were found 
back in 26B. These SNPs were used for mapping minor 
QTLs in the genomic areas that were not identical be- 
tween 26B and ER7A. The other genomic areas were 
completely devoid of SNPs because they were identical 
between the 26B and ER7A parents (white gaps in 
Figure 3). The scattered raw SNP variant frequencies 
were smoothed and a confidence interval was calculated,
as previously described [20,29]. The hidden
Markov model EXPLoRA (see Materials and methods)
was used to evaluate whether candidate regions showed
significant linkage to the low glycerol phenotype.
EXPLoRA indicated six significant QTLs: on chr. I
(3859-11045), chr. II (584232-619637), chr. IV 
(316389-375978 and 696486-748140), and chr. XIII 
Figure 2 Glycerol and ethanol yield (on glucose) in parental strains, hybrid diploid and segregants. (a) Glycerol and ethanol yield (on
glucose) in the parental strains, 26B (■) and ER7A (▲), the hybrid diploid strain 26B/ER7A (•) and in segregants of 26B/ER7A (o). For screening
purposes, one fermentation was carried out for each strain in 100 ml YP with 10% glucose. Glycerol and ethanol yields of all segregants, ER7A 
and the diploid 26B/ER7A were related to the yield of 26B, which was set as 100%. (b,c) Distribution of the glycerol and ethanol yield (on 
glucose) in the unselected (b) and selected (c) segregant pool of 26B/ER7A. The criteria for selection of "low glycerol" segregants (<120% glycerol 
yield, >99% ethanol yield) are indicated with stippled lines. The values of the 22 selected segregants are the average of three replicates. These 
segregants were used for pooled-segregant whole-genome sequence analysis. The glycerol and ethanol yield of the parental strains, 26B and 
ER7A, and diploid 26B/ER7A are indicated as in (a). 
Figure 3 Plots of SNP variant frequency versus chromosomal position and corresponding probability of linkage to the superior or
inferior parent. Plots of SNP variant frequency versus chromosomal position in all 16 yeast chromosomes for the selected (raw data: light grey
triangles; smoothed data: red line) and unselected pool (raw data: light grey circles; smoothed data: green line). Significant upward deviations
from the average of 0.5 indicate linkage to the superior parent 26B, while significant downward deviations indicate linkage to the inferior parent
ER7A. The smoothed line was determined as described previously [20,29]. Linked regions were detected with EXPLoRA (Duitama et al. 
in preparation). 



(600902-610995 and 634582-640415) for the selected 
segregants pool. 
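
The relation between read counts, SNP variant frequency and a smoothed profile can be illustrated with the sketch below. It is not the procedure of [20,29] or of EXPLoRA: the centered moving average is a deliberately simple stand-in for the smoothing used in the study, and the function names and read counts are hypothetical.

```python
def variant_frequencies(counts):
    """Convert read counts into SNP variant frequencies.

    counts: list of (position, reads_with_26B_nucleotide, total_reads).
    Positions without coverage are skipped.
    """
    return [(pos, alt / total) for pos, alt, total in counts if total > 0]


def moving_average(freqs, window=3):
    """Centered moving average over neighboring SNPs.

    Assumption: a plain moving average replaces the smoothing method
    of the original study, purely for illustration.
    """
    half = window // 2
    smoothed = []
    for i, (pos, _) in enumerate(freqs):
        chunk = [f for _, f in freqs[max(0, i - half): i + half + 1]]
        smoothed.append((pos, sum(chunk) / len(chunk)))
    return smoothed


# Hypothetical read counts at four SNP positions in a pooled sample.
counts = [(100, 5, 10), (200, 8, 10), (300, 9, 10), (400, 0, 0)]
freqs = variant_frequencies(counts)
print(moving_average(freqs))
```

In a pool without linkage the frequency is expected to scatter around 0.5; a sustained upward deviation in the selected pool, absent from the unselected pool, is what points to a QTL inherited from 26B.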

The locus on chr. I was present in both the selected 
and unselected pool and was thus likely linked to an 
inadvertently selected trait, such as sporulation cap- 
acity or spore viability. It was excluded from further 
analysis. EXPLoRA also reported two significantly



linked loci on chr. VI (169586-170209) and chr. VII 
(472620-493523) for the unselected pool. Both loci 
were linked to the inferior parent, ER7A. For the re- 
gion on chr. VII, the linked locus with the inferior 
parent genome was also present in the selected pool. 
Both loci likely represent linkage to inadvertently se- 
lected traits, such as sporulation capacity or spore 
viability. It is unclear why the locus on chr. VI was
only present in the unselected pool. Since neither locus
was linked to the low glycerol phenotype, they
were not investigated further.

The locus on chr. II was interesting since it also 
appeared in the previous mapping with the two original 
parents, CBS4C and ER7A, but in that case it was not 
pronounced enough to be significant [19]. The mapping 
with the backcross has now confirmed the relevance of 
this locus. On chr. IV and XIII, two new QTLs with a 
significant linkage to the low glycerol/high ethanol yield 
phenotype were detected. These QTLs were not present 
in our previous mapping with the original parent strains 
CBS4C and ER7A. 

All QTLs with a significant link to the phenotype under 
study, i.e. those on chr. II, IV and XIII, were further inves- 
tigated in detail. Selected SNPs within the respective 
QTLs were scored in the 22 individual superior segregants 
to determine precisely the SNP variant frequency and the 
statistical significance of the linkage. Using the binomial 
test previously described [20,29] none of the three loci 
was found to be significantly linked to the genome of the 
superior parent strain 26B with the low number of superior
segregants available. Therefore, we isolated 400
additional F1 segregants of the diploid 26B/ER7A and
screened them for low glycerol/high ethanol production.
In addition, we performed four rounds of random inbreeding
(mating and sporulation) with all F1 segregants
from the diploid 26B/ER7A to increase the recombination
frequency [23] and subsequently also evaluated 400 F5
segregants in small-scale fermentations for glycerol/ethanol
yield. The results for the 400 F1 and 400 F5 segregants
are shown in Figure 4a. The glycerol and ethanol yields
are again expressed as percentage of that of the superior
parent strain 26B. There was again a clear inverse correlation
between glycerol and ethanol yield. From the 800
segregants, we selected in total 48 superior segregants, i.e.
22 F1 segregants and 26 F5 segregants (Figure 4b).

We next scored selected SNPs in the putative QTLs
on chr. II, IV and XIII in the 22 additionally selected F1
segregants and the 26 selected F5 segregants. Next, we
determined the SNP variant frequency and the corresponding
P-value, as described previously [20,29], for the
following groups of segregants: the 22 initially selected
segregants of the sequenced pool, the 22 additionally selected
F1 segregants, the total of 44 selected F1 segregants,
the 26 selected F5 segregants and the total of 70
selected F1 and F5 segregants. They are shown in
Figure 4c. By increasing the number of superior segre- 
gants, we were now able to demonstrate significant 
linkage (P-value < 0.05) to the genome of the superior 
parent strain 26B for the three QTLs under study. For 
the QTLs on chr. II and IV the linkage was very strong, 
while for the QTL on chr. XIII it was still weak, but 



significant. In contrast, the second region on chr. IV did
not show significant linkage in any of the pools.
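
Why a larger number of superior segregants was needed to reach significance can be illustrated with an exact binomial test. The sketch below is a simplified stand-in and not necessarily the test of [20,29]: it assumes a one-sided test against a null frequency of 0.5 (no linkage), and the function names and example counts are hypothetical.

```python
from math import comb


def variant_frequency(n_superior, n_total):
    """Fraction of segregants carrying the nucleotide of the superior parent."""
    return n_superior / n_total


def binomial_linkage_pvalue(n_superior, n_total, p_null=0.5):
    """One-sided exact binomial P-value.

    Probability of observing at least n_superior segregants with the
    superior-parent nucleotide if the SNP segregates randomly (p_null).
    """
    return sum(
        comb(n_total, k) * p_null**k * (1 - p_null) ** (n_total - k)
        for k in range(n_superior, n_total + 1)
    )


# Hypothetical counts: a similar variant frequency in a small and a large group.
for n_superior, n_total in [(15, 22), (48, 70)]:
    freq = variant_frequency(n_superior, n_total)
    p = binomial_linkage_pvalue(n_superior, n_total)
    print(f"{n_superior}/{n_total}: frequency {freq:.2f}, P = {p:.4f}")
```

Under these assumptions, 15 of 22 segregants (68%) gives a P-value of about 0.067, which is not significant, whereas a similar frequency observed in 70 segregants falls well below 0.05. A weakly linked QTL therefore becomes detectable only when more superior segregants are scored.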

Identification of causative genes in the QTLs on chr. II, IV 
and XIII 

For further analysis, we first selected three potential candidate
genes within the three QTLs, based on their
known function in glycerol metabolism. SMP1, which is
located in the QTL on chr. II (594,864 to 593,506 bp),
encodes a putative transcription factor involved in regulating
glycerol production during the response to
osmostress [30]. The gene is located in the chromosomal
region from 584,232 to 619,637 bp, which was predicted
as most significant by the EXPLoRA model. The 26B
SMP1 allele has two point mutations within its coding
sequence, which change the primary protein sequence
at position 110 from arginine to glutamine and
at position 269 from proline to glutamine. Hence, we
have named this allele smp1^R110Q,P269Q.

The SNP with the highest linkage within the QTL
found on chr. IV was located at position 411,831 bp
(Figure 4c), which is within the open reading frame of
GPD1 (411,825 to 413,000 bp). This is the structural
gene for the NAD+-dependent cytosolic GPDH [15,16].
This enzyme catalyzes the conversion of DHAP to glycerol
3-phosphate through the oxidation of NADH and
has been shown to be the rate-controlling step in glycerol
formation [1,16]. The GPD1 allele of 26B harbors a
point mutation, changing leucine at position 164 into
proline. This mutation was found before (DDBJ database
data, accession number AY598965). The GPD1 allele of
26B was named gpd1^L164P.

The SNP with the highest linkage within the QTL
found on chr. XIII was located at position 606,166 bp
(Figure 4c), which is within the open reading frame of
HOT1 (605,981 to 608,140 bp). HOT1 encodes a transcription
factor required for the response to osmotic
stress of glycerol biosynthetic genes, including GPD1,
and other HOG-pathway regulated genes [31,32]. The
26B HOT1 allele contains two non-synonymous point
mutations, changing proline to serine at position 107
and histidine to tyrosine at position 274. We have
named the HOT1 allele of 26B hot1^P107S,H274Y.

We first investigated the effect of smp1^R110Q,P269Q,
gpd1^L164P and hot1^P107S,H274Y on the low glycerol/high
ethanol phenotype using reciprocal hemizygosity analysis
(RHA) [25]. For that purpose, we constructed for each
gene a pair of hemizygous diploid 26B/ER7A hybrid
strains, in which each pair contained a single copy of the
superior or the inferior allele of SMP1, GPD1 or HOT1,
respectively, while the other copy of the gene was deleted.
The three pairs of hemizygous diploids were tested
in the same 100 ml YP 10% glucose fermentations as 
previously used for the screening. The parent strains 
Figure 4 Linkage analysis of QTLs on chr. II, IV and XIII with different groups of segregants. (a) Glycerol and ethanol yield (on glucose) of
the parental strains, 26B (■) and ER7A (▲), and the hybrid diploid strain 26B/ER7A (•). Glycerol and ethanol yield of the first isolated F1
segregants from 26B/ER7A (o), of the additional F1 segregants (□) and of the F5 segregants (0). For screening purposes, one fermentation was
carried out in 5 ml YP 10% glucose. Glycerol and ethanol yield of all segregants, ER7A and the diploid 26B/ER7A were related to the yield of 26B,
which was set as 100%. (b) Segregants were selected for low glycerol (<120% glycerol yield, stippled line) and high ethanol (>99% ethanol yield,
stippled line) yield (on glucose) after each round of screening, resulting in the following segregant groups: 22 F1 segregants used for pooled-
segregant whole-genome sequence analysis (o), 22 additional selected F1 segregants (□), and 26 F5 segregants (0). These segregants were
reconfirmed in 100 ml YP 10% glucose. Values for glycerol and ethanol yield are the average of three replicates. (c) SNP variant frequency (top)
and respective P-value (bottom) were determined by allele-specific PCR in individual segregants of the sequenced selected pool (•), additional F1
selected pool (o), the total F1 selection of 44 (▲), the selection of F5 segregants, and the total selection of all 70 segregants (■) to fine-map
the QTLs on chr. II, IV and XIII, which were detected with EXPLoRA. The statistical confidence line (for P-value < 0.05) is indicated with a
stippled line.



26B and ER7A and the hybrid diploid 26B/ER7A were 
added as controls. The glycerol and ethanol yields were 
again expressed as percentage of those of 26B, which 
were set at 100%. The significance of any differences between
the strains was evaluated using a two-tailed unpaired
t-test, with a P-value < 0.05 considered to indicate
a significant difference. The results of the RHA are



shown in Figure 5. They indicate that both smp1^R110Q,P269Q
and hot1^P107S,H274Y, but not gpd1^L164P, derived from
the superior parent 26B cause a significant drop in the
glycerol/ethanol ratio compared to the alleles of the inferior
parent strain ER7A. For smp1^R110Q,P269Q only the
reduction in glycerol, and not the increase in ethanol,
was significant with the P-value < 0.05 used. These
Figure 5 Reciprocal hemizygosity analysis (RHA). RHA for the candidate genes, SMP1 (chr. II), GPD1 (chr. IV), and HOT1 (chr. XIII) to
evaluate them as causative genes in the QTLs. For RHA, diploid strains were constructed with either the deletion of the ER7A allele or the
deletion of the 26B allele. Glycerol and ethanol yield (on glucose) of the two hemizygous diploid strains were related to the parental strain 26B.
The Student t-test was used to confirm significant differences in glycerol and ethanol yield for the two diploids and is indicated with *. Each
strain construct was tested in triplicate.



results indicate that smp1^R110Q,P269Q is probably a causative
gene in the QTL on chr. II. They do not exclude
that the QTL may contain a second causative gene,
especially since smp1^R110Q,P269Q is not located in the region
with the strongest linkage (lowest P-value).
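
The comparison of a pair of hemizygous diploids, each tested in triplicate, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis: it uses the pooled-variance form of the unpaired t-test (the text does not state which variant was applied), compares the statistic with the two-tailed critical value for 4 degrees of freedom instead of computing an exact P-value, and uses hypothetical glycerol yields.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical value of Student's t for alpha = 0.05 and
# 4 degrees of freedom (two groups of three replicates).
T_CRIT_DF4 = 2.776


def unpaired_t(a, b):
    """Pooled-variance (Student) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical glycerol yields (% of 26B) for one pair of hemizygous diploids.
superior_allele = [118.0, 120.0, 119.0]  # diploid retaining the 26B allele
inferior_allele = [127.0, 129.0, 128.0]  # diploid retaining the ER7A allele

t = unpaired_t(superior_allele, inferior_allele)
significant = abs(t) > T_CRIT_DF4
print(f"t = {t:.2f}, significant at P < 0.05: {significant}")
```

With only three replicates per strain, the critical value is large, so modest allele effects such as the ethanol increase for smp1^R110Q,P269Q can fail to reach significance even when the glycerol decrease does.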

The RHA with the GPD1 alleles failed to show any
difference both for glycerol and ethanol production
(Figure 5). Hence, the superior character of the
gpd1^L164P allele could not be confirmed with RHA. This
is remarkable because the SNP with the strongest linkage
(lowest P-value) in the QTL on chr. IV was located
in the open reading frame of GPD1 and showed very
strong linkage to the low glycerol/high ethanol phenotype.
The hot1^P107S,H274Y allele of the superior strain 26B,
in contrast, caused a reduction in glycerol and an increase
in ethanol production, and both changes were
significant (P-value < 0.05) (Figure 5). Hence, these results
indicate that hot1^P107S,H274Y is a causative allele in
the QTL on chr. XIII and because it contains the SNP
with the strongest linkage (lowest P-value), it is likely
the main causative allele in this QTL.

The glycerol yield for the inferior parent ER7A and 
the diploid 26B/ER7A were on average 143% and 126% 
of the 26B yield (Figure 5). Ethanol yield of both strains 
was correspondingly reduced to 98% of the 26B yield. 
Clearly, the smp1^R110Q,P269Q and hot1^P107S,H274Y alleles
can only be responsible for part of the difference in the
glycerol/ethanol ratio between the parent strains. The
same was found previously for the ssk1^E330N...K356N allele
[19]. This confirms that the glycerol/ethanol ratio in



yeast fermentation is a true polygenic, complex trait, de- 
termined by an interplay of multiple mutant genes. 

Expression of the gpd1^L164P allele from 26B in haploid
gpd1Δ strains reveals its superior character

Several explanations could account for the failure to
confirm the superior character of the gpd1^L164P allele
from 26B in the RHA test. A closely located gene may
be the real causative gene, the gpd1^L164P allele may be
effective only in a haploid genetic background, or the effect
of the gpd1^L164P allele may be suppressed through
epistasis by one or both of the other two superior alleles,
smp1^R110Q,P269Q and hot1^P107S,H274Y. To distinguish between
these possibilities, we amplified the gpd1^L164P
allele from strain CBS4C and the GPD1 allele from
strain ER7A by PCR (410,523 to 413,479 bp, including
promoter, ORF and terminator). The PCR fragments were
ligated into the centromeric plasmid YCplac33, resulting
in plasmids YCplac33/gpd1^L164P-CBS4C and YCplac33/GPD1-ER7A.
Both plasmids were transformed into
gpd1Δ strains of the two parents 26B and ER7A, the hybrid
diploid 26B/ER7A and the lab strain BY4742
[33,34]. All strains were tested in 100 ml fermentations
with YP 10% glucose. Glycerol and ethanol yields were
determined after 120 h of fermentation. The results are
shown in Figure 6.

When the gpd1^L164P-CBS4C allele or the GPD1-ER7A
allele were expressed in the gpd1Δ strains of the superior
parent 26B or the hybrid diploid 26B/ER7A, the increase 
in glycerol production and the decrease in ethanol 
Figure 6 Expression of gpd1^L164P-CBS4C and GPD1-ER7A in segregant 26B, ER7A, the diploid 26B/ER7A and BY4742. Glycerol and
ethanol yield (on glucose) in the gpd1Δ strains, 26B, ER7A, 26B/ER7A and BY4742, harboring the plasmids YCplac33, YCplac33 GPD1-ER7A, and
YCplac33 gpd1^L164P-CBS4C. Fermentations were carried out in 100 ml YP 10% glucose. Each strain construct was tested in triplicate. Glycerol and
ethanol yield of the strains were related to the yield of 26B, which was set at 100%. In the BY4742 and ER7A backgrounds, which lack the
smp1^R110Q,P269Q and hot1^P107S,H274Y alleles, the gpd1^L164P allele clearly reduced glycerol yield and concomitantly increased ethanol yield compared
to the wild type GPD1 allele. In the strains 26B and 26B/ER7A, which contain the smp1^R110Q,P269Q and hot1^P107S,H274Y alleles, the gpd1^L164P allele
resulted in a similar glycerol yield as the wild type GPD1 allele.



production was the same for the two alleles. On the
other hand, expression of the gpd1^L164P-CBS4C allele in
the gpd1Δ strains of the inferior parent ER7A or the lab
strain BY4742 enhanced glycerol production and
reduced ethanol production significantly less than expression
of the GPD1-ER7A allele. The latter shows that
the gpd1^L164P-CBS4C allele is superior compared to the
GPD1-ER7A allele. The difference between the two
alleles is apparently not dependent on the haploid or
diploid background of the strain but seems to be related
to the presence of the two other superior alleles,
smp1^R110Q,P269Q and hot1^P107S,H274Y. They are both
present in the two strains, 26B and 26B/ER7A, in which
gpd1^L164P-CBS4C has no differential effect, and absent in
the two strains, ER7A and BY4742, in which gpd1^L164P-CBS4C
has a differential effect. Hence, the superior potency
of gpd1^L164P-CBS4C may be suppressed through
epistasis by smp1^R110Q,P269Q and/or hot1^P107S,H274Y. On the
other hand, we cannot exclude that the effect of
gpd1^L164P-CBS4C is suppressed by one or more other mutant
genes present in the superior parent 26B or the hybrid
diploid 26B/ER7A.

We have scored the final 70 superior segregants with
a glycerol yield < 120% and an ethanol yield > 99% of
that of the superior parent 26B for the presence of the
three causative alleles, smp1^R110Q,P269Q, gpd1^L164P and
hot1^P107S,H274Y. The results are shown in Figure 7a. The
largest group of superior segregants contained all three
mutant alleles, followed by smaller groups with only two
of the three mutant alleles and finally the three smallest
groups with only one mutant allele. Hence, there was a
clear correlation between the number of mutant alleles
and low glycerol/high ethanol yield in this group of selected
segregants. On the other hand, although there
was a tendency for a lower mean glycerol/ethanol yield
ratio with an increasing number of mutant alleles, the
differences between the means of the different groups
were small and the variation remained large and with
the same range for the three largest categories.

We have also investigated a possible correlation between
the different mutant alleles and the strength of
the low glycerol/high ethanol phenotype. For that purpose,
we determined the percentage of segregants with a
specific mutant allele in sets of strains with a decreasing
glycerol yield or an increasing ethanol yield. The results
show that there is no preference between the different
alleles in the strains with a higher glycerol yield, but in
the strains with the lowest glycerol yield, the gpd1^L164P
allele is preferentially present, followed by the hot1^P107S,H274Y
allele, although this only holds for the category
with the lowest glycerol yield (Figure 7b). Hence, the
order of potency of the three alleles appears to be:
gpd1^L164P > hot1^P107S,H274Y > smp1^R110Q,P269Q. There was
no correlation between the variant frequency of the
three alleles and high ethanol yield, indicating that other
minor QTLs may affect ethanol yield independently
Figure 7 Distribution of the gpd1^L164P, hot1^P107S,H274Y and smp1^R110Q,P269Q alleles in the selected low glycerol/high ethanol segregants.
(a) Glycerol and ethanol yield (on glucose) in segregants with different combinations of the superior alleles, gpd1^L164P, hot1^P107S,H274Y
and smp1^R110Q,P269Q, in the selected segregant pool. The mean value of the glycerol and ethanol yield is indicated for each group. (b) Variant
frequency of gpd1^L164P (•), hot1^P107S,H274Y (▲) and smp1^R110Q,P269Q (○) in the 70 selected segregants, which were categorized according to
decreasing glycerol yield and increasing ethanol yield. Glycerol yield was divided into nine bins, each with a similar number of strains, starting
from <96.5 and with a bin width of 2.3%. Accordingly, ethanol yield was divided into seven bins, each with a similar number of strains, starting
from <99.16 and with a bin width of 0.185%. The number of segregants in each bin is indicated in the lower panel.



from glycerol yield and act together with the currently 
identified alleles. 
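The per-bin variant frequencies reported for Figure 7b can be illustrated with a short sketch. This is not the authors' script: the record layout (one relative glycerol yield plus a set of detected alleles per segregant), the allele labels, the function name and the example values are all assumptions made for illustration. Only the first bin edge (96.5%), the bin width (2.3%) and the number of bins (nine) come from the figure legend, and fixed-width bins are assumed.

```python
from collections import defaultdict

# Hypothetical labels for the three causative alleles.
ALLELES = ("gpd1_L164P", "hot1_P107S_H274Y", "smp1_R110Q_P269Q")


def allele_frequency_by_bin(segregants, first_edge=96.5, width=2.3, n_bins=9):
    """Return {bin index: {allele: fraction of segregants carrying it}}.

    `segregants` is an iterable of (relative_glycerol_yield, set_of_alleles).
    Bin 0 collects yields below `first_edge`; each following bin is `width`
    wide, and the last bin also collects everything above its lower edge.
    """
    members = defaultdict(list)
    for glycerol_yield, alleles in segregants:
        if glycerol_yield < first_edge:
            index = 0
        else:
            index = min(int((glycerol_yield - first_edge) // width) + 1, n_bins - 1)
        members[index].append(alleles)

    return {
        index: {a: sum(a in carried for carried in group) / len(group) for a in ALLELES}
        for index, group in sorted(members.items())
    }


# Invented example records, for illustration only (not data from this study).
example = [
    (95.0, {"gpd1_L164P"}),
    (96.0, {"gpd1_L164P", "hot1_P107S_H274Y"}),
    (100.0, {"smp1_R110Q_P269Q"}),
]
frequencies = allele_frequency_by_bin(example)
```

Plotting these fractions against the bin index for each allele would give a curve of the kind described for Figure 7b; an allele enriched in the lowest-yield bins would stand out as more potent.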

Discussion 

Identification of superior alleles as gene tools for 
reduction of the glycerol/ethanol yield ratio 

The goal of the present work was to investigate whether 
natural yeast strains may harbor specific alleles, e.g. in 
structural and/or regulatory genes of glycerol metabol- 
ism, that would allow reducing the glycerol yield and in- 
creasing the ethanol yield in yeast fermentation without 
causing negative side-effects on other essential functions. 



We successfully identified three mutant alleles, which 
separately and together reduce the glycerol yield in a 
subtle way without affecting, at least not in a conspicu- 
ous way, the overall rate and characteristics of the 
fermentation process. Combined with the previous discovery
of the ssk1^E330N...K356N allele in strain CBS4C
[19], this indicates that the original diploid parent strain
CBS6412 contains at least four specific alleles causing 
reduced glycerol production and concomitantly higher 
ethanol production. This suggests that a low glycerol/ 
ethanol yield ratio may have been advantageous for the 
survival of the S. cerevisiae strain CBS6412 in nature. 
Hence, we have now identified four mutant alleles acting
at different levels, either transcriptional, regulatory or 
structural, in the biosynthesis of glycerol and its regula- 
tion, and that have been pre-filtered by natural selection 
and evolution for compatibility with survival in the nat- 
ural environment. This spreads the reduction of glycerol 
yield not only over different target genes, each causing a 
subtle reduction, but also over different nodes in the 
cellular network and thus likely minimizes further the 
risk of negative side-effects. 

For the phenotypic screening, we primarily selected on 
low glycerol yield. Although there was in general an in- 
verse relationship between glycerol and ethanol yield, 
this was not true in all strains. Moreover, the variant fre- 
quency of the three mutant alleles for low glycerol yield 
and high ethanol yield (Figure 7b) did not match. In the 
segregants with the lowest glycerol yield, as opposed to
those with higher glycerol yield, the gpd1^L164P allele was
preferentially present, but this difference was not observed
between segregants with higher and lower ethanol
yield. This suggests that there are other factors that
determine high ethanol yield independent of glycerol
yield or that it is the combination of the mutant alleles
that is required to enhance ethanol yield rather than the
presence of specific alleles. The previously identified
major ssk1^E330N...K356N allele, which was present in nearly
all F1 segregants with low glycerol/ethanol ratio [19],
might also have played a more important role in
establishing high ethanol yield than the three currently
identified minor causative alleles.

Our results therefore confirm that natural yeast strains 
harbor mutant alleles of the well known structural and 
regulatory genes identified in laboratory yeast strains that 
have been filtered by natural selection and evolution for 
compatibility with survival in the natural environment. 
The chance that these alleles exert significant negative 
effects on other essential functions of the yeast cells is 
probably not completely absent but at least minimized 
compared to drastic genetic modifications like gene dele- 
tion or overexpression. Screening of biodiversity for such 
specific alleles therefore appears to be a fruitful strategy to 
identify mutant alleles that can be used as specific gene 
tools for strain improvement by targeted genetic modifica- 
tion. A related example is the development of S. cerevisiae 
wine strains with a higher glycerol/lower ethanol produc- 
tion ratio [12-14]. Higher glycerol production is a pre- 
ferred characteristic because it improves the mouthfeel of 
wine and lower ethanol production is a preferred charac- 
teristic as well because of the restrictions imposed on 
ethanol consumption for driving. Our results with the
many segregants tested for glycerol/ethanol yield in fermentation
have revealed many strains with a much higher
glycerol and much lower ethanol yield than the parent
strains (Figures 2, 4). This suggests that also for this
beneficial trait specific alleles could be identified that
would allow this trait to be improved in a more subtle way.
Possibly, this would make it possible to enhance glycerol production
and reduce ethanol production without the dramatic increase
in undesirable acetate production that resulted
from classical genetic engineering of structural genes for
GPDH [1].

Identification of minor QTLs and causative genes 

While identification of major QTLs has become straight- 
forward with pooled-segregant whole-genome sequence 
analysis [19,20,22,23], identification of minor QTLs 
remains a major challenge. This is especially true for 
phenotypes that require a high workload for scoring and 
for which as a result only low numbers of selected segre- 
gants can be obtained to assemble the pool for whole- 
genome sequence analysis. In the present paper we have 
successfully established a novel approach for minor QTL 
identification. After mapping major QTLs and identification
of the causative genes, the F1 segregants displaying
the phenotype-of-interest are screened for absence of 
one or more superior alleles. As a result, these segre- 
gants should have all or most minor QTLs able to confer 
the phenotype-of-interest. This is not only because they 
are needed to confer the phenotype in the absence of 
the superior allele, but also because their effect is often 
suppressed by a major superior allele through epistasis. 
Hence, the use of a rare segregant displaying the trait- 
of-interest, might be advantageous in case of gene inter- 
ference. To display the trait-of-interest, the segregant 
must have inherited a set of compatible mutant alleles, 
with any interfering mutations being absent. When such 
a rare F1 segregant is backcrossed with the inferior
parent, the segregants from this cross displaying the 
phenotype-of-interest, should again contain all or most 
of these minor QTLs facilitating their mapping and 
identification of their causative allele. As shown in 
Figure 7a, several of these segregants contained only 
one of the three causative alleles and still displayed a 
low glycerol yield under the cut-off of 120%. This sug- 
gests that there are additional alleles present in these 
strains able to confer low glycerol yield and that a new 
backcross of such a segregant with the inferior parent 
may allow identification of additional alleles conferring 
low glycerol yield. In principle this approach could thus 
be repeated with each new generation of segregants. 
Previously, similar approaches have been used in which 
either F1 segregants were backcrossed to eliminate a
major QTL [27] or major QTLs were fixed in one of the 
parents and the crossing repeated [26]. In these cases 
the parents displayed a reduced phenotypic difference, 
which may make the phenotyping and the stringency of 
selection in the next cross more cumbersome. 
Random coincidence versus linkage in small segregant
populations

In small populations of segregants, random coincidence 
can easily cause falsely predicted QTLs, which are diffi- 
cult to distinguish from QTLs with significant, but weak 
linkage [35]. In this case, the unselected pool is of little 
use because it usually does not contain at the same pos- 
ition the same false QTL caused by random coincidence. 
Higher stringency in QTL selection can eliminate false 
QTLs but also weakly linked true QTLs. An essential 
difference between a false QTL caused by random coin- 
cidence and a true QTL with weak linkage, is that the 
latter should be reproducible. Therefore, we screened 
three different pools of segregants resulting in three in- 
dependent pools of small-size with segregants displaying 
low glycerol yield. This allowed us to distinguish the 
false QTLs on chr. IV (696486-748140) and chr. XIII 
(634582-640415) from the true, weakly linked QTLs on 
the same chromosomes, chr. IV (316389-375978) and 
chr. XIII (600902-610995). It is well known that a 
higher number of segregants increases the reliability of 
minor QTL detection. In previous work, many minor 
QTLs could be identified by using millions of segregants 
and a selectable phenotype [23]. However, most complex 
traits are not selectable and phenotypic screening of mil- 
lions of segregants is not feasible for many traits. In 
these cases, reliable minor QTL identification remains a 
major challenge. In addition, the number of segregants 
that can be pooled for pooled-segregant whole genome 
sequence analysis is in principle unlimited, but in prac- 
tice the useful number of segregants is limited by the 
average coverage in whole-genome sequencing. When 
the number of segregants exceeds the average coverage 
of sequencing, the surplus no longer enhances the reli- 
ability of mapping and is thus useless. 
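Two of the arguments above lend themselves to a quick back-of-the-envelope calculation. The sketch below is only an illustration under simplifying assumptions that are not taken from the paper: an unlinked SNP is inherited from either parent with probability 0.5, pools are independent, and every segregant contributes equally to the sequenced DNA. The pool size of 22 matches the selected pool described in the Methods; the threshold of 17 out of 22 and the example of 100 segregants at 30-fold coverage are hypothetical.

```python
from math import comb


def chance_tail_probability(n_segregants, min_count, p=0.5):
    """Probability that at least `min_count` of `n_segregants` carry the
    same parental variant at an unlinked SNP purely by chance."""
    return sum(
        comb(n_segregants, k) * p**k * (1 - p) ** (n_segregants - k)
        for k in range(min_count, n_segregants + 1)
    )


def expected_distinct_segregants(n_segregants, coverage):
    """Expected number of different segregants represented among `coverage`
    reads at one position, assuming equal representation in the pool."""
    return n_segregants * (1 - (1 - 1 / n_segregants) ** coverage)


# Chance of a skewed, but unlinked, SNP in one pool of 22 segregants.
p_single = chance_tail_probability(22, 17)
# Chance that the same false signal recurs in three independent pools.
p_three_pools = p_single**3
# With 100 pooled segregants and 30 reads, fewer than 30 segregants are sampled.
sampled = expected_distinct_segregants(100, 30)
```

With many thousands of SNPs per genome, a small single-pool probability still yields some false peaks, whereas the same false peak recurring in three independent pools is far less likely, which is the rationale for requiring reproducibility. The second function shows why adding segregants beyond the sequencing coverage adds little information.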

Novel mutant alleles and possible epistatic interactions 

Smp1 is a transcription factor, belonging to the MEF2
family, that regulates the expression of stress-responsive
genes, such as GPD1. Its DNA binding domain is located
at amino acid residues 1-90 [36]. Upon osmotic
stress, Smp1 is phosphorylated by Hog1, which physically
interacts with its C-terminal domain. Four different
phosphorylation sites were identified, i.e. Ser348, Ser357,
Thr365, and Ser376, all located within a region coincident
with the Hog1 binding domain. Phosphorylation of
Smp1 is essential for its functioning, since an allele
unable to be phosphorylated caused an impaired stress
response [30]. The point mutations in the 26B allele,
smp1^R110Q,P269Q, are not located in the DNA or the
Hog1 binding domain. However, the change of a proline
to a glutamine (P269Q), close to the phosphorylation sites,
might change Smp1 structure, thereby influencing its
ability to be bound and/or phosphorylated by Hog1.
The smp1^R110Q,P269Q allele is dominant since its expression
in the hybrid 26B/ER7A diploid decreased
glycerol yield and increased ethanol yield.

GPD1 encodes NAD+-dependent cytosolic glycerol 3-phosphate
dehydrogenase. It catalyzes the conversion of
DHAP to glycerol 3-phosphate through the oxidation of
NADH. The expression of GPD1 is induced by the HOG
pathway and it is essential for growth under high osmolarity
[16]. Possible domains for binding of NADH, H+ and
DHAP have been predicted based on similarity with proteins
with a comparable function [37]. The single point
mutation present in the 26B allele, gpd1^L164P, may be
located in the putative NADH-binding domain, but the location
of this domain is not well predicted. This mutation
was found earlier and called a 'natural variant' (DDBJ
database data, accession number AY598965). No linkage
of gpd1^L164P with low glycerol yield was observed in RHA
but its effect was revealed by expression in a GPD1-deficient
mutant. The L164P mutation could reduce the
intrinsic activity of the Gpd1 enzyme or its expression by
lowering GPD1 mRNA stability. Both possibilities would
result in reduced glycerol production and thus explain the
low glycerol yield of CBS6412. Both explanations are consistent
with the lower GPDH activity that we measured in
CBS4C (0.128 ± 0.027 U/mg protein) compared to ER7A
(0.225 ± 0.053 U/mg protein).

The gpd1^L164P allele was apparently subject to epistatic
suppression in the superior strains. In the BY4742 and
ER7A backgrounds, which lack the smp1^R110Q,P269Q and
hot1^P107S,H274Y alleles, the gpd1^L164P allele had a clear
reducing effect on glycerol yield compared to the wild
type GPD1 allele (Figure 6). On the other hand, its expression
in the strains 26B and 26B/ER7A, which contain
the smp1^R110Q,P269Q and hot1^P107S,H274Y alleles,
resulted in a similar glycerol yield as the wild type GPD1
allele. This suggests that the smp1^R110Q,P269Q and
hot1^P107S,H274Y alleles and/or other alleles present in the
superior strains suppress the effect of the gpd1^L164P allele.
The epistatic effect may be explained at the biochemical
level by the fact that the reduction in
expression of GPD1, caused by the smp1^R110Q,P269Q and
hot1^P107S,H274Y alleles, is so strong that the mutation in
GPD1 itself no longer has a significant effect. Hot1 activates
transcription of GPD1 and other HOG-dependent
genes under osmostress [31,32,38]. Alepuz et al. [31]
proposed that Hot1 serves as an anchor for Hog1, which
directly recruits the RNA polymerase II complex. The
position of the Hog1 binding domain in Hot1 is unknown.
It is unclear how the two mutations in the 26B
allele, hot1^P107S,H274Y, could affect the functioning of the
protein.

Interestingly, although all mutant genes revealed in
this work had already been identified previously using
classical molecular genetics approaches, our work
clearly indicates the potential of complex trait analysis
for identifying new alleles of known components, and
possibly also completely new components, in signaling
pathways and other cellular functions. In our case, it
seems plausible that continuation of backcrossing with
new segregants from subsequent generations, displaying
low glycerol/high ethanol yield and lacking
(most of) the previously identified alleles, might reveal
new players in the HOG signaling pathway or in transcriptional
control of GPD1.

Conclusions 

This work has shown that yeast biodiversity harbors 
multiple mutant alleles of genes in glycerol biosynthesis 
and its regulation that can be used to lower the glycerol 
yield in bioethanol production. Since these are natural 
alleles that cause subtle changes and that act at different 
levels in glycerol biosynthesis and its regulation, their 
use minimizes the risk of negative side-effects on other 
industrially important properties, as is often seen with 



drastic alterations of structural and regulatory genes by 
genetic modification. 



Materials and methods 

Microbial strains, cultivation conditions and plasmids 

All S. cerevisiae strains used are listed in Table 1.
Yeast strains were grown in 1% yeast extract, 2%
peptone medium (YP) with glucose as carbon source at
the indicated concentration. E. coli strain DH5α™
(Invitrogen Corp., Carlsbad) was used for amplification
of plasmids. The strain was grown in Luria-Bertani
(LB) medium containing 0.5% (w/v) yeast extract, 1%
(w/v) Bacto tryptone and 1% (w/v) NaCl (pH 7.5) at 37°C.
E. coli transformation and isolation of plasmid DNA
were carried out using standard techniques [39]. Transformants
were selected on LB medium containing
100 μg/ml ampicillin. The plasmids used are shown in
Table 2.



Table 1 Saccharomyces cerevisiae strains used

| Strain | Genotype | Source |
|---|---|---|
| CBS6412 | Diploid, ssk1^E330N...K356N/ssk1^E330N...K356N | CBS-KNAW |
| Ethanol Red | Diploid, SSK1/SSK1 | Fermentis, S. I. Lesaffre |
| ER7A | Segregant 7A of Ethanol Red, Mata | This study |
| CBS4C | Segregant 4C of CBS6412, Mata, ssk1^E330N...K356N | This study |
| 26B | Segregant of the cross ER7A x CBS4C, Mata, SSK1 | This study |
| 26B Mata | Mating type switch of 26B to Mata | This study |
| 26B/ER7A | Hybrid diploid 26B/ER7A | This study |
| 26B smp1Δ/ER7A | Hybrid diploid 26B smp1Δ/ER7A | This study |
| 26B/ER7A smp1Δ | Hybrid diploid 26B/ER7A smp1Δ | This study |
| 26B gpd1Δ/ER7A | Hybrid diploid 26B gpd1Δ/ER7A | This study |
| 26B/ER7A gpd1Δ | Hybrid diploid 26B/ER7A gpd1Δ | This study |
| 26B hot1Δ/ER7A | Hybrid diploid 26B hot1Δ/ER7A | This study |
| 26B/ER7A hot1Δ | Hybrid diploid 26B/ER7A hot1Δ | This study |
| BY4742 gpd1Δ YCplac33 | Haploid, gpd1Δ, YCplac33 | This study |
| BY4742 gpd1Δ YCplac33 GPD1-ER7A | Haploid, gpd1Δ, YCplac33 GPD1-ER7A | This study |
| BY4742 gpd1Δ YCplac33 gpd1^L164P-CBS4C | Haploid, gpd1Δ, YCplac33 gpd1^L164P-CBS4C | This study |
| 26B gpd1Δ YCplac33 | Haploid, ura3Δ gpd1Δ YCplac33 | This study |
| 26B gpd1Δ YCplac33 GPD1-ER7A | Haploid, ura3Δ gpd1Δ YCplac33 GPD1-ER7A | This study |
| 26B gpd1Δ YCplac33 gpd1^L164P-CBS4C | Haploid, ura3Δ gpd1Δ YCplac33 gpd1^L164P-CBS4C | This study |
| ER7A gpd1Δ YCplac33 | Haploid, ura3Δ gpd1Δ YCplac33 | This study |
| ER7A gpd1Δ YCplac33 GPD1-ER7A | Haploid, ura3Δ gpd1Δ YCplac33 GPD1-ER7A | This study |
| ER7A gpd1Δ YCplac33 gpd1^L164P-CBS4C | Haploid, ura3Δ gpd1Δ YCplac33 gpd1^L164P-CBS4C | This study |
| 26B/ER7A gpd1Δ/Δ YCplac33 | Diploid, ura3Δ/Δ, gpd1Δ/Δ, YCplac33 | This study |
| 26B/ER7A gpd1Δ/Δ YCplac33 GPD1-ER7A | Diploid, ura3Δ/Δ, gpd1Δ/Δ, YCplac33 GPD1-ER7A | This study |
| 26B/ER7A gpd1Δ/Δ YCplac33 gpd1^L164P-CBS4C | Diploid, ura3Δ/Δ, gpd1Δ/Δ, YCplac33 gpd1^L164P-CBS4C | This study |
Table 2 Plasmids used

| Plasmid | Description | Reference |
|---|---|---|
| pUG6 | E. coli vector containing AmpR, loxP-KanMX6-loxP disruption cassette | [46] |
| pUG66 | E. coli vector containing AmpR, loxP-ble^r-loxP disruption cassette | [46] |
| pFL39 GAL1 HO KanMX | vector containing HO gene | Lab stock |
| YCplac33 | yeast shuttle vector, URA3 | Lab stock |
| YCplac33/GPD1-ER7A | yeast shuttle vector, URA3, GPD1-ER7A | This work |
| YCplac33/gpd1^L164P-CBS4C | yeast shuttle vector, URA3, gpd1^L164P | This work |



Mating, mating type switch, sporulation and internal 
crosses 

Mating and sporulation were carried out according to
standard procedures [40]. Mating type of segregants was
determined by diagnostic PCR for the MAT locus [41].
Mating type switching was performed by induction of the
HO gene expressed from the plasmid pFL39 GAL1 HO
KanMX. Meiotic spores of the hybrid 26B/ER7A were isolated
by random spore analysis [42]. The hybrid 26B/ER7A
was plated on a sporulation plate for a period of about two
weeks up to 1 month until asci were observed. The yeast
cells were washed off the sporulation plate and suspended
in 25 ml of MilliQ water, in a 300 ml Erlenmeyer flask together
with sterile 0.45 mm glass beads. 500 μl Zymolyase
(10 mg/ml) and 10 μl of β-mercaptoethanol were added to
the cell suspension in order to degrade the asci. This cell 
suspension was incubated overnight at 30°C by shaking at 
200 rpm. The cell suspension was transferred to a 50 ml 
tube together with the glass beads, and shaken vigorously. 
The cell debris was centrifuged at 20000 rpm for 20 min. 
The supernatant was discarded and the pellet was sus- 
pended in 5 ml of Nonidet P-40 and placed on ice for 15 
min. This was followed by 4 rounds of sonication (30 s, 
75%). The cell suspension rested 2 min on ice between two 
rounds. After that, the cell suspension was centrifuged for 
10 min at 3000 rpm, the supernatant was discarded, and 
the cells were re-suspended in 1.5% Nonidet P-40. This 
procedure was repeated once, followed by 4 more rounds 
of sonication and incubation on ice. Lastly, the cell suspen- 
sion was centrifuged for 10 min at 3000 rpm. The super- 
natant was discarded and cells were re-suspended in 300 
ml of MilliQ water. The cell suspension was diluted to 
obtain single colonies on plates. Plates were incubated at 
30°C until single colonies were visible. These single col- 
onies were re-plated and checked for mating type to 
confirm haploidy. Usually, this procedure yielded 90% hap- 
loids. Crossing of 26B and ER7A was carried out as 
follows. First, diploids were isolated from the cross of 
ER7A and 26B. For this purpose, the mating type of single 
colonies resulting from the cross between ER7A and 26B 
was checked. In the first step of internal crossing, the dip- 
loids were incubated on a sporulation plate until sufficient 
asci were visible. In the second step, spores were isolated 



using random spore analysis (see above). The isolated
spores were all plated on YD and incubated for 2 days at 
30°C to ensure that enough diploids had been formed. In 
the last step, newly formed diploids were transferred to a 
new sporulation plate to start the next cycle of internal 
crossing. 

Fermentation conditions 

The 26B, ER7A and CBS4C strains were tested in two 
fermentations: minimal medium and YP 10% glucose. 
The minimal medium was composed of 1.9 g l⁻¹ yeast
nitrogen base (Difco), 5 g l⁻¹ ammonium sulphate, 250
mg l⁻¹ leucine, 50 mg l⁻¹ uracil, 100 mg l⁻¹ histidine, 30
mg l⁻¹ lysine, 20 mg l⁻¹ methionine and 50 g l⁻¹ glucose.
The inoculum culture was grown overnight in minimal 
medium containing 2% [w/v] glucose and was used for 
inoculation of the fermentation medium at an initial OD 
of 1. Fermentations were carried out in Erlenmeyer 
flasks, which were equipped with air locks, ensuring the 
exclusion of oxygen but allowing the release of CO2. 
The fermentations were performed at 30°C and cultures 
were continuously stirred at 200 rpm. 

The parental strains, ER7A and 26B, as well as CBS4C,
were additionally tested in YPD, which contained 0.2%
[w/v] yeast extract, 0.6% [w/v] peptone, and 10% [w/v]
glucose, to mimic the free amino nitrogen content
present in wheat liquefact. The fermentations were carried
out in cylindrical glass tubes, which were closed
with a rubber stopper containing a glass pipe, sealed off
with a cotton plug to release CO2. 100 ml fermentation
medium was added to each tube. Inoculum cultures
were grown statically overnight at 30°C in 5 ml of YD
medium one day ahead of the fermentation, and this culture
was used completely to inoculate the 100 ml fermentation
tube. The empty weight, starting weight and
weight after 72 h fermentation of the tubes were measured
to determine the net weight loss of the medium during
the fermentation. Fermentations ran for 72 h at 30°C and
were stirred at 200 rpm. After 72 h, the final weight was
determined. The fermentation broth was cooled overnight
at 4°C prior to the analysis of glycerol and ethanol
concentrations, in order to minimize evaporation of
ethanol during sampling. The fermentations of
the initial screen of 26B/ER7A segregants, those of the
reciprocal hemizygosity analysis, and those of the gpd1Δ
strains were carried out in 100 ml YP 10% glucose.

Screening of additional selected segregants was down- 
scaled to 5 ml fermentation cultures. The pre-culture was 
started one day ahead of the fermentation in 3 ml of YD 
medium. Cultures were grown statically overnight at 30°C. 
The next day, the pre-cultures were used to inoculate the 
5 ml fermentation in a proportion of 1/20. Fermentations 
were kept for 96 h at 30°C and afterwards placed at 4°C 
overnight prior to the analysis of the fermentation broth 
in order to prevent ethanol evaporation. 

Determination of fermentation parameters 

In all fermentations weight loss was used to follow the progress
of the fermentation. Glucose, glycerol and ethanol in
the medium were determined by HPLC (Waters® isocratic
Breeze™ HPLC, ion exchange column WAT010290).
Column temperature was 75°C, 5 mM H2SO4 was used as
eluent with a flow rate of 1 ml min⁻¹ and refractive index
detection was used (Waters 2414 RI detector). The product
yield was calculated from the final product concentration
(g l⁻¹) and the difference in glucose concentration at
the start and end of the fermentation (consumed glucose
in g l⁻¹). Yields of strains used for screening, RHA and the
gpd1Δ complementation analysis were related to the yield
of 26B in the same experiment in order to decrease variance
between different experiments.
Yields were calculated with the following formulas, where $c_P$ is the
product (glycerol or ethanol) concentration and $c_S$ the glucose
concentration, both in g l⁻¹:

Absolute yield: $Y_{P/S} = \dfrac{c_{P,\mathrm{end}}}{c_{S,\mathrm{start}} - c_{S,\mathrm{end}}}$

Relative yield: $Y_{\%} = \dfrac{Y_{P/S}(\mathrm{strain})}{Y_{P/S}(\mathrm{26B})} \times 100$
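The yield calculation described in this section can be written out as a small sketch. The function and parameter names are assumptions introduced here for illustration; the arithmetic follows the text: absolute yield is the final product concentration divided by the consumed glucose, and relative yield expresses a strain's yield as a percentage of the yield of 26B in the same experiment.

```python
def absolute_yield(product_final, glucose_start, glucose_end):
    """Product yield on glucose (g product per g glucose consumed).

    All concentrations are in g per litre.
    """
    consumed = glucose_start - glucose_end
    if consumed <= 0:
        raise ValueError("no glucose was consumed")
    return product_final / consumed


def relative_yield(strain_yield, reference_yield):
    """Yield of a strain as a percentage of the reference strain (26B)."""
    if reference_yield <= 0:
        raise ValueError("reference yield must be positive")
    return 100.0 * strain_yield / reference_yield


# Invented numbers, for illustration only.
ethanol_yield = absolute_yield(product_final=45.0, glucose_start=100.0, glucose_end=10.0)
glycerol_relative = relative_yield(strain_yield=0.04, reference_yield=0.05)
```

Normalising to 26B within each experiment, as the text explains, removes run-to-run variation that affects all strains in a batch equally.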

DNA methods 

Yeast genomic DNA was extracted with Phenol/Chloro- 
form/Isoamyl-alcohol (25:24:1) [43] and further purified 
with diethyl-ether extraction or ethanol precipitation if 
required. PCR was performed with high-fidelity polymerases
Phusion™ (Finnzymes) or ExTaq™ (TaKaRa) for
cloning and amplification of deletion or insertion cassettes,
and sequencing purposes. Sequencing was carried
out using the dideoxy chain-termination method [44] at
the VIB Genetic Service Facility (Antwerp). The sequences
were analyzed with Geneious (Geneious Basic 5.3.4),
SeqMan (Lasergene Coresuite 8) or CLC DNA workbench 
(CLC bio) software. 

Pooled-segregant whole-genome sequence analysis 

The segregant 26B was isolated from the cross of CBS4C
and ER7A. This segregant was backcrossed with its own
parent ER7A. From this backcross, the 22 most superior
segregants (lowest glycerol production) were assembled
in the 'selected pool' while 22 random segregants were
used to assemble the 'unselected pool'. The two pools
were made by combining equal amounts of cells based
on OD600. High molecular weight DNA (3 μg, ~20 kb
fragments) was isolated from the pools and parent
strains according to Johnston and Aust [45]. The purity
of the DNA sample was estimated from UV measurement
(260/280 = 1.7-2.0). The DNA samples were provided
to BGI (Hong Kong, China) for whole-genome
sequence analysis by Illumina technology.
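Combining equal amounts of cells on the basis of optical density comes down to a simple volume calculation, sketched below. The function name, the target of one OD unit per strain and the example readings are assumptions for illustration; the paper only states that equal amounts of cells were combined based on optical density.

```python
def pooling_volumes(od_by_strain, od_units_per_strain=1.0):
    """Volume (ml) of each culture that contributes the same number of
    OD units (OD x ml) to the pool."""
    volumes = {}
    for strain, od in od_by_strain.items():
        if od <= 0:
            raise ValueError(f"non-positive OD for {strain}")
        volumes[strain] = od_units_per_strain / od
    return volumes


# Invented optical density readings, for illustration only.
volumes = pooling_volumes({"seg1": 2.0, "seg2": 4.0})
```

A denser culture contributes a proportionally smaller volume, so that each segregant is represented by roughly the same number of cells in the pooled DNA.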

Mapping of short read sequences, variant calling and 
QTL analysis were carried out as described earlier by 
Swinnen et al. [20] and by Hubmann et al. [19]. The 
SNP variant frequencies were calculated by dividing the 
number of the alternative variant by the total number of 
aligned reads. A very high or a very low frequency was a 
sign of a one-sided SNP segregation preferentially com- 
ing from one parent, indicating a genetic linkage to the 
trait of interest. Genetic linkage was statistically con- 
firmed using EXPloRA (Duitama et al. in preparation) or 
the methods described earlier [20]. 
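The variant-frequency calculation described above can be sketched as follows. This is an illustration only, not the pipeline of Swinnen et al. [20] or EXPloRA: the input layout, the function names, the frequency thresholds (0.2 and 0.8) and the minimum read depth (20) are hypothetical, and a simple threshold is no substitute for the statistical confirmation of linkage mentioned in the text.

```python
def variant_frequency(alt_reads, total_reads):
    """Fraction of aligned reads that carry the alternative variant."""
    if total_reads <= 0:
        raise ValueError("no aligned reads at this position")
    return alt_reads / total_reads


def skewed_snps(snps, low=0.2, high=0.8, min_depth=20):
    """Return (chromosome, position, frequency) for sufficiently covered
    SNPs whose variant frequency is at or below `low` or at or above `high`."""
    flagged = []
    for chrom, pos, alt_reads, total_reads in snps:
        if total_reads < min_depth:
            continue  # too few reads for a meaningful frequency
        freq = variant_frequency(alt_reads, total_reads)
        if freq <= low or freq >= high:
            flagged.append((chrom, pos, freq))
    return flagged


# Invented read counts and positions, for illustration only.
example_snps = [
    ("chrIV", 1000, 36, 40),    # strongly skewed towards the alternative variant
    ("chrIV", 2000, 21, 40),    # close to 50%, as expected for an unlinked SNP
    ("chrXIII", 3000, 3, 10),   # skipped: read depth below the minimum
]
candidates = skewed_snps(example_snps)
```

In a selected pool, runs of neighbouring SNPs with frequencies far from 50% point to a region inherited preferentially from one parent; comparison with the unselected pool helps to separate trait linkage from other causes of skewed segregation.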

Detection of SNP markers 

Individual SNPs were scored by PCR. The forward and 
reverse primer contained the nucleotide of ER7A or 
CBS4C as the 3' terminal nucleotide, respectively. The 
annealing temperature was optimized using DNA of 
ER7A and CBS4C so as to allow only hybridization with 
primers containing an exact match. 

Reciprocal hemizygosity analysis (RHA) 

For RHA analysis [25], two diploid strains were con- 
structed by crossing 26B and ER7A wild type or deletion 
strains for the candidate gene, so that the resulting dip- 
loids only contained a single allele from either 26B or 
ER7A for the candidate gene being evaluated. Deletion 
cassettes for SMP1, HOT1 and GPD1 were constructed as
described by Gueldner et al. [46] using the phleomycin
resistance marker ble^r. After the transformation, the gene
deletions were verified by PCR. RHA was performed with 
three independent isolates of all tested diploids. 

Construction of YCplac GPD1 plasmids

The primers, A-3709 and A-3743, were used to PCR
amplify genomic DNA of CBS4C and ER7A chr. IV
(410523 - 413479), containing the promoter, GPD1
ORF, and terminator. The resulting PCR fragment was
digested with KpnI, purified, and ligated to the plasmid
YCplac33, which was digested with KpnI prior to
ligation. The constructs YCplac33 GPD1-ER7A and
YCplac33 gpd1^L164P-CBS4C were verified using Sanger
sequencing. URA3 was deleted in the strains 26B gpd1Δ
and ER7A gpd1Δ. Both strains were mated to obtain the
diploid strain ER7A/26B gpd1Δ/Δ ura3Δ/Δ. The deletion
cassette was constructed as described by Gueldner et al.
[46] with the geneticin resistance marker KanMX6.
Transformants were selected simultaneously for the two selectable
markers on phleomycin and geneticin, to avoid a
cassette switch. URA3 gene deletion was confirmed by
PCR and absence of growth on SD-ura plates. YCplac33
GPD1-ER7A, YCplac33 gpd1^L164P-CBS4C and the empty
plasmid were transferred to the strains BY4742 gpd1Δ, 26B
gpd1Δ ura3Δ, ER7A gpd1Δ ura3Δ, and YR7 gpd1Δ/Δ
ura3Δ/Δ, using the LiAc/PEG transformation method [47].

Data deposition 

Sequencing data have been deposited at the SRA database
(NCBI), http://www.ncbi.nlm.nih.gov/sra, under
accession number SRA059109.

Abbreviations 

OD: Optical density; YP: Yeast extract peptone; YPD: Yeast extract peptone
dextrose; RHA: Reciprocal hemizygosity analysis; GPDH: Glycerol-3-phosphate dehydrogenase;